2 Probability Basics

1 Random Variable and Distribution

Random Variable

Given a probability space $(\Omega,\mathcal{F},P)$, a random variable (RV) is a function $X:\Omega\to\mathbb{R}$ such that $\{X\le x\}\triangleq\{\omega\in\Omega \mid X(\omega)\le x\}\in\mathcal{F}$ for all $x\in\mathbb{R}$ (we call such an $X$ $\mathcal{F}$-measurable).

Distribution, c.d.f

Given a RV $X$, its (cumulative) distribution function (c.d.f.) $F_X$ is defined as $F_X(x)=P(X\le x)$, $x\in(-\infty,\infty)$.

1.1 Discrete RV

A RV $X$ is discrete if $\mathrm{Range}(X)$ is either finite or countably infinite.

For example, for $A\in\mathcal{F}$, the indicator RV $I_A(\omega)=\begin{cases}1, & \omega\in A,\\ 0, & \omega\notin A\end{cases}$ is a discrete RV.

Its probability mass function (p.m.f.) $p_X(x)=P(X=x)$ has to satisfy $p_X(x)\ge 0$ for all $x$, and $\sum_{x\in\mathrm{Range}(X)}p_X(x)=1$.

1.2 Continuous RV

$F_X(x)$ is continuous for all $x\in\mathbb{R}$. Then we consider the probability density function (p.d.f.) $f_X(x)=\frac{dF_X(x)}{dx}$, so that $F_X(x)=\int_{-\infty}^{x}f_X(y)\,dy$.
More generally, $P(X\in A)=\int_A f_X(x)\,dx$.
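As a numerical sanity check of the relation $F_X(x)=\int_{-\infty}^{x}f_X(y)\,dy$, here is a sketch using an exponential RV with rate $\lambda=2$ (an illustrative choice, not from the text), whose p.d.f. is $f_X(x)=\lambda e^{-\lambda x}$ for $x\ge 0$ and whose c.d.f. is $F_X(x)=1-e^{-\lambda x}$:

```python
import math

lam = 2.0  # rate of an exponential RV (illustrative choice)

def pdf(x):
    # f_X(x) = lam * exp(-lam * x) for x >= 0, else 0
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf_exact(x):
    # closed-form c.d.f. of the exponential distribution
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def cdf_numeric(x, n=100_000):
    # trapezoidal approximation of the integral of f_X over [0, x]
    h = x / n
    total = 0.5 * (pdf(0.0) + pdf(x))
    for i in range(1, n):
        total += pdf(i * h)
    return total * h

print(cdf_exact(1.0), cdf_numeric(1.0))  # both close to 1 - e^{-2}
```

The numeric integral matches the closed-form c.d.f. to within the trapezoidal-rule error.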

2 Expectation

Expectation/Mean

For a function $g:\mathbb{R}\to\mathbb{R}$, define the expectation $E[g(X)]$ as follows.

  • For a discrete RV, $E[g(X)]=\sum_{x\in\mathrm{Range}(X)}g(x)\,P(X=x)$.
  • For a continuous RV, $E[g(X)]=\int_{-\infty}^{\infty}g(x)\,f_X(x)\,dx$.

Provided that $E[|g(X)|]<\infty$ (i.e. the sum/integral is absolutely convergent).

By the following theorem, we need absolute convergence to ensure that $E[g(X)]$ is well defined: without it, the value of the sum could depend on the order of summation.

Theorem (Riemann Rearrangement Theorem)

If $\sum_{n=1}^{\infty}a_n$ converges but $\sum_{n=1}^{\infty}|a_n|$ diverges, then for any given $r\in[-\infty,\infty]$, there exists a permutation $\pi$ such that $\sum_{n=1}^{\infty}a_{\pi(n)}=r$.
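A small sketch of the rearrangement idea, using the alternating harmonic series $\sum_{n\ge 1}(-1)^{n+1}/n$ (which converges conditionally to $\ln 2$) as an assumed example: greedily take positive terms while below the target and negative terms while above it, and the partial sums are steered toward any chosen $r$.

```python
import math

def rearranged_partial_sum(r, n_terms=100_000):
    # Rearrange the series 1 - 1/2 + 1/3 - 1/4 + ... so that its
    # partial sums approach r instead of ln 2.
    pos = 1  # next odd denominator: positive terms +1/1, +1/3, ...
    neg = 2  # next even denominator: negative terms -1/2, -1/4, ...
    s = 0.0
    for _ in range(n_terms):
        if s <= r:       # below target: take the next positive term
            s += 1.0 / pos
            pos += 2
        else:            # above target: take the next negative term
            s -= 1.0 / neg
            neg += 2
    return s

print(rearranged_partial_sum(1.0))  # close to 1.0, not ln 2 ≈ 0.693
```

The same terms, summed in a different order, converge to a different value, which is exactly why expectations are only defined under absolute convergence.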

Variance

Define the variance of $X$: $\mathrm{Var}(X)=E\big[(X-E[X])^2\big]$.

Claim (Linearity of Expectation)

Let $X_1,\dots,X_n$ be RVs defined on the same probability space $(\Omega,\mathcal{F},P)$ such that each $E[X_i]$ is well defined. Then for all constants $c_1,\dots,c_n$, $E\!\left[\sum_{i=1}^{n}c_iX_i\right]=\sum_{i=1}^{n}c_iE[X_i]$.

By the claim, expanding $(X-E[X])^2=X^2-2XE[X]+(E[X])^2$ gives $\mathrm{Var}(X)=E[X^2]-(E[X])^2$.
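As a concrete check of the shortcut formula, here is a sketch computing the variance of a fair six-sided die (an illustrative example) both from the definition and as $E[X^2]-(E[X])^2$, using exact rational arithmetic:

```python
from fractions import Fraction

# p.m.f. of a fair six-sided die: P(X = k) = 1/6 for k = 1..6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(g):
    # E[g(X)] = sum over Range(X) of g(x) * P(X = x)
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)                        # E[X] = 7/2
var_def = expect(lambda x: (x - mean) ** 2)       # E[(X - E[X])^2]
var_short = expect(lambda x: x ** 2) - mean ** 2  # E[X^2] - (E[X])^2

print(mean, var_def, var_short)  # 7/2 35/12 35/12
```

Both routes give $\mathrm{Var}(X)=35/12$, as the identity promises.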

Covariance

Let $X,Y$ be RVs on the same probability space. Then $\mathrm{Cov}(X,Y)=E\big[(X-E[X])(Y-E[Y])\big]=E[XY]-E[X]E[Y]$.

Similarly, we can show

$\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)+2\,\mathrm{Cov}(X,Y)$.

Theorem (Tail Sum Formula)

Let $X$ be a RV with range $\{0,1,2,\dots\}$. Then $E[X]=\sum_{k=1}^{\infty}P(X\ge k)$.

This follows directly from the definition: write $E[X]=\sum_{k=1}^{\infty}kP(X=k)$, expand $k=\sum_{j=1}^{k}1$, and swap the order of summation.
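A quick check of the tail sum formula on a fair six-sided die (an illustrative finite-range RV, so both sums are exact):

```python
from fractions import Fraction

# fair die: P(X = k) = 1/6 for k = 1..6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# direct definition: E[X] = sum_k k * P(X = k)
mean_direct = sum(k * p for k, p in pmf.items())

# tail sum: E[X] = sum_{k >= 1} P(X >= k); here P(X >= k) = (7 - k)/6
mean_tail = sum(sum(p for x, p in pmf.items() if x >= k)
                for k in range(1, 7))

print(mean_direct, mean_tail)  # 7/2 7/2
```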

3 Conditional Probability

Given that event $B$ happens, what is the probability that $A$ also happens?
We want to consider a new probability space $(B,\mathcal{F}_B,P_B)$. How should we define $P_B$ so that it is consistent with $P$?

For all $E_1,E_2\in\mathcal{F}$ such that $E_1\subseteq B$ and $E_2\subseteq B$, we want $\frac{P(E_1\cap B)}{P(E_2\cap B)}=\frac{P_B(E_1\cap B)}{P_B(E_2\cap B)}$, which forces $P_B=cP$ on subsets of $B$.
Since $P_B(B)=1$, we know $c=P(B)^{-1}$.

So for all $A,B\in\mathcal{F}$ such that $P(B)>0$, define the conditional probability $P(A\mid B)=P_B(A\cap B)=\frac{P(A\cap B)}{P(B)}$.
$A$ and $B$ are independent if $P(A\mid B)=P(A)$, i.e. $P(A\cap B)=P(A)P(B)$.
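A worked example (two fair dice, an illustrative choice not from the text): with $A$ = "the sum is 8" and $B$ = "the first die shows 3", enumerate the sample space and compute $P(A\mid B)=P(A\cap B)/P(B)$ directly.

```python
from fractions import Fraction
from itertools import product

# sample space: ordered rolls of two fair dice, each outcome has prob 1/36
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def prob(event):
    # P(E) = sum of outcome probabilities over outcomes in E
    return sum(p for w in omega if event(w))

A = lambda w: w[0] + w[1] == 8   # sum of the dice is 8
B = lambda w: w[0] == 3          # first die shows 3

p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)
print(p_A_given_B, prob(A))  # 1/6 vs 5/36
```

Here $P(A\mid B)=1/6\ne P(A)=5/36$, so $A$ and $B$ are not independent.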

Independence of Events

Events $E_1,\dots,E_n$ are independent iff for every $k=2,\dots,n$ and every $k$-subset $\{i_1,\dots,i_k\}\subseteq\{1,\dots,n\}$, $P\!\left(\bigcap_{j=1}^{k}E_{i_j}\right)=\prod_{j=1}^{k}P(E_{i_j})$.

Independence of RVs

RVs $X,Y$ on the same probability space $(\Omega,\mathcal{F},P)$ are said to be independent iff $P[(X\le x)\cap(Y\le y)]=P(X\le x)P(Y\le y)$ for all $x,y\in\mathbb{R}$.
Equivalent conditions:

  • For the discrete case, $P[(X=x)\cap(Y=y)]=P(X=x)P(Y=y)$ for all $x,y$.
  • For the continuous case, $f_{X,Y}(x,y)=f_X(x)f_Y(y)$ for all $x,y$.

RVs $X_1,\dots,X_n$ on the same probability space $(\Omega,\mathcal{F},P)$ are said to be mutually independent iff $P\!\left[\bigcap_{i=1}^{n}(X_i\le x_i)\right]=\prod_{i=1}^{n}P(X_i\le x_i)$ for all $x_1,\dots,x_n\in\mathbb{R}$.

Partition

Let $A_1,\dots,A_n\in\mathcal{F}$ and $B\in\mathcal{F}$. We say $A_1,\dots,A_n$ form a partition of $B$ if

  • $B=A_1\cup\cdots\cup A_n$;
  • $A_i\cap A_j=\emptyset$ for all $i\ne j$.

Theorem (Law of Total Probability)

Suppose $B_1,\dots,B_n\in\mathcal{F}$ is a partition of $\Omega$ such that $P(B_i)>0$ for all $i$. Then for all $A\in\mathcal{F}$, $P(A)=\sum_{i=1}^{n}P(A\mid B_i)P(B_i)$.
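A small numerical sketch of the law of total probability (the setup is an assumed example): a coin is drawn at random, fair with probability $2/3$ and biased toward heads ($P(\text{heads})=3/4$) with probability $1/3$; the two cases partition $\Omega$.

```python
from fractions import Fraction

# partition of Omega by which coin is drawn (illustrative numbers):
# B1 = fair coin, B2 = biased coin
prior = {"B1": Fraction(2, 3), "B2": Fraction(1, 3)}       # P(B_i)
p_heads_given = {"B1": Fraction(1, 2), "B2": Fraction(3, 4)}  # P(A | B_i)

# law of total probability: P(A) = sum_i P(A | B_i) P(B_i)
p_heads = sum(p_heads_given[b] * prior[b] for b in prior)
print(p_heads)  # 1/3 + 1/4 = 7/12
```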

A similar result applies to a countably infinite partition of Ω.

4 Bayes' Formula

Theorem (Bayes' Formula)

Let $B_1,\dots,B_n\in\mathcal{F}$ be a partition of $\Omega$ with $P(B_i)>0$ for all $i$. Then for all $A\in\mathcal{F}$ with $P(A)>0$, $P(B_i\mid A)=\frac{P(B_i\cap A)}{P(A)}=\frac{P(A\mid B_i)P(B_i)}{\sum_{j=1}^{n}P(A\mid B_j)P(B_j)}$.
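A classic application of Bayes' formula, sketched with illustrative numbers (prevalence, sensitivity, and false-positive rate are assumptions, not from the text): a rare disease $D$ with $P(D)=1/1000$, a test with $P(+\mid D)=99/100$ and $P(+\mid D^c)=5/100$, using the partition $\{D, D^c\}$.

```python
from fractions import Fraction

# illustrative numbers (assumptions):
p_d = Fraction(1, 1000)       # prevalence P(D)
p_pos_d = Fraction(99, 100)   # sensitivity P(+ | D)
p_pos_nd = Fraction(5, 100)   # false-positive rate P(+ | not D)

# denominator via the law of total probability over {D, not D}
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' formula: P(D | +) = P(+ | D) P(D) / P(+)
p_d_pos = (p_pos_d * p_d) / p_pos
print(p_d_pos, float(p_d_pos))
```

Despite the accurate test, $P(D\mid +)$ is under $2\%$, because the false positives among the healthy majority dominate the numerator.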